Single-cell RNA-seq data are simulated to represent a situation in which two groups of cells produced by some experimental procedure show heterogeneous expression across a number of genes. Both groups also have genes that are differentially expressed relative to a group of control cells.
We will show that the two groups of cells that underwent the experimental procedure cannot be distinguished by dimension reduction techniques that ignore the information contained in the control cells.
```r
library(splatter)
library(scater)
library(matrixStats)

# simulate the three groups of cells such that cell heterogeneity is masked by
# some batch effect
params <- newSplatParams(
  seed = 6757293,
  nGenes = 500,
  batchCells = c(150, 150),
  batch.facLoc = c(0.05, 0.05),
  batch.facScale = c(0.05, 0.05),
  group.prob = rep(1/3, 3),
  de.prob = c(0.1, 0.05, 0.1),
  de.downProb = c(0.1, 0.05, 0.1),
  de.facLoc = rep(0.2, 3),
  de.facScale = rep(0.2, 3)
)
sim_groups_sce <- splatSimulate(params, method = "groups")

# compute log-normalized counts (stored in the "logcounts" assay);
# logNormCounts() replaces scater's deprecated normalize()
sim_groups_sce <- logNormCounts(sim_groups_sce)

# remove all genes (rows) without variation in counts
sim_groups_sce <- sim_groups_sce[rowVars(as.matrix(counts(sim_groups_sce))) != 0, ]
```
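Dropping zero-variance genes also keeps scaled PCA well defined. `prcomp(..., scale. = TRUE)` stops with an error when any gene is constant across cells. Whether this was the authors' reason for the filter is an assumption. The base R sketch below uses a hypothetical toy count matrix and uses `apply(..., var)` in place of `rowVars()`.

```r
set.seed(42)
# hypothetical toy count matrix: 5 genes (rows) x 20 cells (columns)
counts_toy <- matrix(rpois(5 * 20, lambda = 5), nrow = 5,
                     dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:20)))
counts_toy["Gene3", ] <- 0  # a gene never detected in any cell

# keep only genes whose counts vary across cells (same rule as rowVars(...) != 0)
keep <- apply(counts_toy, 1, var) != 0
filtered <- counts_toy[keep, ]

# PCA expects cells as rows, hence t(); scaling works once constant genes are gone
pca_ok <- prcomp(t(log1p(filtered)), scale. = TRUE)

# on the unfiltered matrix, scaling hits the constant gene and prcomp() errors
err <- tryCatch(prcomp(t(log1p(counts_toy)), scale. = TRUE),
                error = conditionMessage)
err
```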
We take the first two principal components of the entire dataset to illustrate that the variance caused by the batch effect dominates all other signals in the data.
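The same effect can be reproduced on a toy matrix, which shows why a strong batch shift captures the leading principal component. The sketch below is an illustrative assumption in base R, not the splatter simulation above. Half of the genes receive a large shift in batch 2, and a few genes receive a small group-specific shift. The correlation between PC1 and the batch label then comes out close to 1.

```r
set.seed(2)
n_cells <- 300; n_genes <- 100
batch <- rep(1:2, each = n_cells / 2)           # two batches of 150 cells
group <- sample(1:3, n_cells, replace = TRUE)   # three hidden groups
x <- matrix(rnorm(n_cells * n_genes), n_cells, n_genes)  # cells x genes noise

# strong batch effect on half of the genes
x[batch == 2, 1:50] <- x[batch == 2, 1:50] + 3
# weak group-specific shifts on a handful of genes
x[group == 1, 51:55] <- x[group == 1, 51:55] + 0.5
x[group == 3, 56:60] <- x[group == 3, 56:60] + 0.5

pcs <- prcomp(x)$x[, 1:2]   # scores on the first two principal components
abs(cor(pcs[, 1], batch))   # PC1 tracks the batch, not the groups
```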
Next, we apply various dimension reduction techniques to the target data, i.e. the cells that were subjected to the experimental procedure. The transcriptomes of the control cells serve as the background dataset for cPCA and scPCA.
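To show how a background dataset changes the decomposition, here is a minimal base R sketch of contrastive PCA (cPCA) on toy data. It is not the scPCA package's implementation. It assumes the usual cPCA formulation, which takes the leading eigenvectors of $C_{\text{target}} - \alpha C_{\text{background}}$ for a fixed contrast parameter $\alpha$. Here $\alpha = 1$ is a hypothetical choice. scPCA additionally makes the loadings sparse and tunes $\alpha$ and the penalty, and neither step is shown. The names `simulate_cells` and `cpca` and the toy data are illustrative.

```r
set.seed(1)
n <- 1000; p <- 30
nuisance_genes <- 1:10   # variation shared by target and background
signal_genes   <- 11:15  # variation present only in the target

simulate_cells <- function(n, with_groups) {
  x <- matrix(rnorm(n * p), n, p)           # unit-variance noise, cells x genes
  z <- rnorm(n, sd = 1.5)                   # shared nuisance factor
  x[, nuisance_genes] <- x[, nuisance_genes] + z
  if (with_groups) {
    g <- rbinom(n, 1, 0.5)                  # two hidden groups in the target
    x[, signal_genes] <- x[, signal_genes] + 3 * g
  }
  x
}

target     <- simulate_cells(n, with_groups = TRUE)
background <- simulate_cells(n, with_groups = FALSE)

cpca <- function(target, background, alpha = 1, k = 2) {
  # eigendecomposition of the contrastive covariance matrix
  contrast <- cov(target) - alpha * cov(background)
  eig <- eigen(contrast, symmetric = TRUE)  # eigenvalues in decreasing order
  eig$vectors[, seq_len(k), drop = FALSE]
}

pca_load  <- prcomp(target)$rotation[, 1]   # ordinary PCA, target only
cpca_load <- cpca(target, background)[, 1]  # contrastive direction

# share of the (unit-norm) loading placed on the group-specific genes
sum(pca_load[signal_genes]^2)
sum(cpca_load[signal_genes]^2)
```

Ordinary PCA of the target spends its first component on the shared nuisance genes. The contrastive direction instead concentrates on `signal_genes`, because subtracting the background covariance cancels the variation that target and control cells have in common.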
| Gene | DEFacGroup1 | DEFacGroup3 | Absolute difference | Non-zero in first scPCA loading (1 = yes) |
|---|---|---|---|---|
| Gene327 | 1.0000000 | 2.2944560 | 1.2944560 | 1 |
| Gene201 | 2.0011872 | 1.0000000 | 1.0011872 | 0 |
| Gene346 | 1.0000000 | 1.9037754 | 0.9037754 | 1 |
| Gene240 | 1.0000000 | 1.7944498 | 0.7944498 | 0 |
| Gene128 | 1.7802675 | 1.0000000 | 0.7802675 | 1 |
| Gene473 | 1.0000000 | 1.7703224 | 0.7703224 | 0 |
| Gene214 | 1.6983572 | 1.0000000 | 0.6983572 | 0 |
| Gene188 | 1.0000000 | 1.6860079 | 0.6860079 | 1 |
| Gene307 | 1.6467488 | 1.0000000 | 0.6467488 | 1 |
| Gene192 | 1.6426308 | 1.0000000 | 0.6426308 | 1 |
| Gene44 | 1.6113054 | 1.0000000 | 0.6113054 | 1 |
| Gene454 | 1.0000000 | 1.5922447 | 0.5922447 | 1 |
| Gene270 | 1.0000000 | 1.5701455 | 0.5701455 | 0 |
| Gene304 | 1.0000000 | 1.5430068 | 0.5430068 | 0 |
| Gene190 | 1.0000000 | 1.5176864 | 0.5176864 | 1 |
| Gene383 | 0.4846386 | 1.0000000 | 0.5153614 | 1 |
| Gene158 | 1.5126046 | 1.0000000 | 0.5126046 | 1 |
| Gene8 | 1.5483328 | 1.0770052 | 0.4713276 | 1 |
| Gene370 | 1.4712158 | 1.0000000 | 0.4712158 | 1 |
| Gene364 | 1.4459099 | 1.0000000 | 0.4459099 | 1 |
| Gene66 | 1.0000000 | 1.4211804 | 0.4211804 | 1 |
| Gene68 | 1.0000000 | 1.3941840 | 0.3941840 | 1 |
| Gene10 | 1.3941636 | 1.0000000 | 0.3941636 | 0 |
| Gene54 | 1.0000000 | 1.3759716 | 0.3759716 | 1 |
| Gene315 | 1.0000000 | 1.3691284 | 0.3691284 | 0 |
| Gene3 | 1.3611369 | 1.0000000 | 0.3611369 | 0 |
| Gene135 | 1.3587745 | 1.0000000 | 0.3587745 | 0 |
| Gene334 | 1.0000000 | 0.6421265 | 0.3578735 | 0 |
| Gene196 | 1.3468610 | 1.0000000 | 0.3468610 | 1 |
| Gene245 | 1.0000000 | 1.3271180 | 0.3271180 | 0 |
| Gene220 | 1.3079478 | 1.0000000 | 0.3079478 | 0 |
| Gene342 | 1.0000000 | 1.2988780 | 0.2988780 | 0 |
| Gene380 | 1.2972411 | 1.0000000 | 0.2972411 | 0 |
| Gene228 | 1.2931549 | 1.0000000 | 0.2931549 | 0 |
| Gene363 | 0.7128209 | 1.0000000 | 0.2871791 | 0 |
| Gene229 | 1.2861742 | 1.0000000 | 0.2861742 | 0 |
| Gene80 | 0.7147228 | 1.0000000 | 0.2852772 | 0 |
| Gene100 | 1.0000000 | 1.2837038 | 0.2837038 | 0 |
| Gene338 | 1.2741687 | 1.0000000 | 0.2741687 | 0 |
| Gene275 | 1.0000000 | 1.2706184 | 0.2706184 | 0 |
| Gene108 | 1.0000000 | 1.2704223 | 0.2704223 | 0 |
| Gene436 | 0.7300889 | 1.0000000 | 0.2699111 | 0 |
| Gene143 | 1.2692307 | 1.0000000 | 0.2692307 | 0 |
| Gene254 | 1.2596478 | 1.0000000 | 0.2596478 | 0 |
| Gene353 | 1.0000000 | 1.2574056 | 0.2574056 | 0 |
| Gene489 | 1.0000000 | 1.2499518 | 0.2499518 | 0 |
| Gene285 | 1.2458224 | 1.0000000 | 0.2458224 | 1 |
| Gene103 | 1.2402441 | 1.0000000 | 0.2402441 | 0 |
| Gene482 | 1.0000000 | 1.2327516 | 0.2327516 | 0 |
| Gene258 | 1.0000000 | 0.7857538 | 0.2142462 | 0 |
| Gene218 | 1.0000000 | 1.2088149 | 0.2088149 | 0 |
| Gene458 | 1.0000000 | 1.1934206 | 0.1934206 | 0 |
| Gene235 | 1.1802687 | 1.0000000 | 0.1802687 | 1 |
| Gene197 | 1.0000000 | 0.8232851 | 0.1767149 | 0 |
| Gene453 | 1.0000000 | 1.1731249 | 0.1731249 | 0 |
| Gene36 | 1.0000000 | 0.8368966 | 0.1631034 | 0 |
| Gene28 | 1.0000000 | 1.1586270 | 0.1586270 | 0 |
| Gene193 | 1.0000000 | 1.1564362 | 0.1564362 | 0 |
| Gene55 | 1.1393105 | 1.2937132 | 0.1544027 | 0 |
| Gene302 | 1.0000000 | 1.1483239 | 0.1483239 | 0 |
| Gene238 | 1.1441278 | 1.0000000 | 0.1441278 | 0 |
| Gene30 | 1.0000000 | 1.1425416 | 0.1425416 | 0 |
| Gene75 | 1.0000000 | 1.1370305 | 0.1370305 | 0 |
| Gene11 | 1.0000000 | 1.1324617 | 0.1324617 | 0 |
| Gene424 | 1.1310105 | 1.0000000 | 0.1310105 | 0 |
| Gene70 | 1.0000000 | 0.8755727 | 0.1244273 | 0 |
| Gene169 | 1.0000000 | 1.1230450 | 0.1230450 | 0 |
| Gene405 | 1.1723449 | 1.0548861 | 0.1174588 | 0 |
| Gene250 | 1.0000000 | 0.8854873 | 0.1145127 | 0 |
| Gene46 | 1.0000000 | 1.1050231 | 0.1050231 | 0 |
| Gene145 | 1.0000000 | 1.0937630 | 0.0937630 | 0 |
| Gene374 | 1.0918226 | 1.0000000 | 0.0918226 | 0 |
| Gene399 | 1.0000000 | 1.0807069 | 0.0807069 | 0 |
| Gene484 | 1.0773906 | 1.0000000 | 0.0773906 | 0 |
| Gene475 | 1.0760033 | 1.0000000 | 0.0760033 | 0 |
| Gene222 | 1.0737436 | 1.0000000 | 0.0737436 | 0 |
| Gene202 | 1.6427858 | 1.7156525 | 0.0728667 | 0 |
| Gene278 | 1.0726935 | 1.0000000 | 0.0726935 | 0 |
| Gene132 | 1.0000000 | 1.0715175 | 0.0715175 | 0 |
| Gene468 | 1.0000000 | 0.9316385 | 0.0683615 | 0 |
| Gene292 | 1.0627392 | 1.0000000 | 0.0627392 | 0 |
| Gene126 | 1.0621330 | 1.0000000 | 0.0621330 | 0 |
| Gene239 | 1.0000000 | 1.0619006 | 0.0619006 | 0 |
| Gene116 | 1.0000000 | 1.0596130 | 0.0596130 | 0 |
| Gene231 | 1.0000000 | 1.0546435 | 0.0546435 | 0 |
| Gene118 | 1.0474613 | 1.0000000 | 0.0474613 | 0 |
| Gene256 | 1.0406122 | 1.0000000 | 0.0406122 | 0 |
| Gene227 | 1.0000000 | 1.0400291 | 0.0400291 | 0 |
| Gene455 | 1.0373562 | 1.0000000 | 0.0373562 | 0 |
| Gene347 | 1.3211550 | 1.2865117 | 0.0346433 | 0 |
| Gene403 | 1.0000000 | 1.0307340 | 0.0307340 | 0 |
| Gene309 | 1.0181823 | 1.0000000 | 0.0181823 | 0 |
| Gene416 | 1.1885126 | 1.1734931 | 0.0150194 | 0 |
| Gene461 | 1.0000000 | 1.0108062 | 0.0108062 | 0 |
| Gene291 | 1.0089752 | 1.0000000 | 0.0089752 | 0 |
| Gene396 | 1.0000000 | 1.0071671 | 0.0071671 | 0 |
| Gene123 | 1.0041118 | 1.0000000 | 0.0041118 | 0 |
| Gene434 | 1.0000000 | 1.0029095 | 0.0029095 | 0 |
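The table lists 98 differentially expressed genes. The first scPCA loading vector has 20 non-zero entries, and all 20 correspond to differentially expressed genes. The selected genes tend to be the most prominent ones, i.e. those with the largest absolute difference between the DE factors of groups 1 and 3: 14 of the 20 genes with the largest difference are selected.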
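Every row of the table agrees with reading its difference column as the absolute difference between `DEFacGroup1` and `DEFacGroup3`. The base R sketch below recomputes that column for a few rows copied from the table and ranks them. It does not rerun scPCA, so the selection flags are copied from the table as reported.

```r
# a handful of rows copied from the table above
de_tab <- data.frame(
  Gene        = c("Gene327", "Gene201", "Gene383", "Gene8", "Gene10"),
  DEFacGroup1 = c(1.0000000, 2.0011872, 0.4846386, 1.5483328, 1.3941636),
  DEFacGroup3 = c(2.2944560, 1.0000000, 1.0000000, 1.0770052, 1.0000000),
  scPCA1      = c(1, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# absolute difference, so genes with a factor below 1 (down-regulated)
# also receive a positive value
de_tab$diff <- abs(de_tab$DEFacGroup1 - de_tab$DEFacGroup3)

# rank genes by that difference and count how many scPCA selected
de_tab <- de_tab[order(de_tab$diff, decreasing = TRUE), ]
de_tab
sum(de_tab$scPCA1)
```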